{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/mlop-ai/mlop/blob/main/examples/intro.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "<h1 align=\"center\" style=\"font-family: Inter, sans-serif; font-style: normal; font-weight: 700; font-size: 72px\">m:lop</h1>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KD6sAisV4ue2"
   },
   "source": [
    "<div class=\"markdown-google-sans\">\n",
    "  <h1><strong>Welcome</strong></h1>\n",
    "</div>\n",
    "This is an interactive notebook to help you get started with the basic capabilities of the logger. Buckle up!\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "Install **mlop**'s [Python SDK](https://github.com/mlop-ai/mlop), then proceed with the login"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "q-epCw4DBQd5",
    "outputId": "3a200681-43ac-4d40-ab2f-d511e41f606e"
   },
   "outputs": [],
   "source": [
    "%pip install -Uq \"mlop[full]\"\n",
    "# %pip install \"mlop[full] @ git+https://github.com/mlop-ai/mlop.git\"\n",
    "# import sys; import os; sys.path.insert(0, os.path.dirname(os.path.abspath(os.path.dirname(\"__file__\"))))\n",
    "import mlop\n",
    "\n",
    "mlop.login()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "UqZgDpoJ4ue5"
   },
   "source": [
    "## Simulate and track an experiment with **mlop**\n",
    "\n",
    "Create, track, and visualize an experiment:\n",
    "\n",
    "0. Set up the hyperparameters to be tracked\n",
    "\n",
    "1. Initialize and name the new **mlop** run\n",
    "\n",
    "2. Log metrics such as the accuruacy and loss within the training loop\n",
    "\n",
    "3. Gracefully exit the run if everything completed successfully"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "YJ1KcEHJ4ue5",
    "outputId": "cf3c96b6-4b7e-4cf0-d587-577e2496945e"
   },
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "# 0. set up hyperparameters\n",
    "config = {\n",
    "    \"learning_rate\": 0.02,\n",
    "    \"architecture\": \"CNN\",\n",
    "    \"dataset\": \"CIFAR-100\",\n",
    "    \"epochs\": 10,\n",
    "}\n",
    "\n",
    "# 1. initialize a run\n",
    "op = mlop.init(\n",
    "    project=\"example\",\n",
    "    name=\"simulation-standard\", # will be auto-generated if left unspecified\n",
    "    config=config,\n",
    ")\n",
    "\n",
    "# simulated training loop\n",
    "epochs = 10\n",
    "offset = random.random() / 5\n",
    "for epoch in range(1, epochs+1):\n",
    "    acc = 1 - 2**-epoch - random.random() / epoch - offset\n",
    "    loss = 2**-epoch + random.random() / epoch + offset\n",
    "\n",
    "    # 2️. record metrics from the script to mlop\n",
    "    op.log({\"acc\": acc, \"loss\": loss})\n",
    "    print(f\"Epoch {epoch}/{epochs}\")\n",
    "\n",
    "# 3. mark the run as finished\n",
    "op.finish()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rzDj--tUBQd5"
   },
   "source": [
    "## Simulate an ultra-high throughput ML experiment with **mlop**\n",
    "\n",
    "Compared to traditional experiment loggers, **mlop** also by default uses a high-throughput mode.\n",
    "\n",
    "This means there is little software limit in terms of how many data points you can log at a given time.\n",
    "\n",
    "Other experiment trackers often easily hit a rate limit due to their inherent architecture; **mlop** does not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "kQWPRVtZBQd6",
    "outputId": "5519b7d3-a7b7-42b2-ed0a-e88865230445"
   },
   "outputs": [],
   "source": [
    "import random\n",
    "import time\n",
    "\n",
    "config = {\n",
    "    \"epochs\": 100_000,\n",
    "    \"metrics\": 20,\n",
    "    \"wait\": 0.01\n",
    "}\n",
    "\n",
    "settings = mlop.Settings()\n",
    "settings.meta = []\n",
    "\n",
    "op = mlop.init(\n",
    "    project=\"example\",\n",
    "    name=\"simulation-fast\",\n",
    "    config=config,\n",
    "    settings=settings\n",
    ")\n",
    "\n",
    "start = time.time()\n",
    "for i in range(config['epochs']):\n",
    "    dummy_data = {f\"val/metric-{i}\": random.random() for i in range(config['metrics'])}\n",
    "    op.log(dummy_data)\n",
    "\n",
    "    if i % 10_000 == 0:\n",
    "        print(f\"Epoch {i + 1}/{config['epochs']}, sleeping {config['wait']}s\")\n",
    "        time.sleep(config['wait'])\n",
    "\n",
    "print(f\"Logged {int(config['epochs']*config['metrics'])} points in {time.time() - start:.2f}s\")\n",
    "op.finish()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "1_YmSyWR4ue6"
   },
   "source": [
    "Now that we know how to integrate **mlop** into a simulated ML training loop, let's track an actual ML experiment using a basic PyTorch neural network.\n",
    "\n",
    "##  Track an ML experiment with PyTorch\n",
    "\n",
    "The following code cell defines and trains a simple MNIST classifier. During training, you will see **mlop** prints out URLs. Click on the project page link to see your results stream in live to a **mlop** project.\n",
    "\n",
    "**mlop** automatically logs metrics, console output (both `stdout` and `stderr`), system information (optional), system resource usage (`cpu`, `gpu`, `memory`, `disk`, `network`, `processes`), as well as  hyperparameters (specified in `config`).\n",
    "\n",
    "You will be able to see an interactive graph with model inputs and outputs.\n",
    "\n",
    "### Set up PyTorch DataLoader\n",
    "The following cell defines some useful functions that we will need to train our ML model (these are not unique to the experiment tracker itself). See the PyTorch documentation for more information on how to define [forward and backward training loops](https://pytorch.org/tutorials/beginner/nn_tutorial.html), how to use [PyTorch DataLoaders](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) to load data in for training, and how to [define PyTorch models](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) using `torch.nn.Sequential`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "lsLNGc1r4ue6"
   },
   "outputs": [],
   "source": [
    "import numpy\n",
    "import torch, torchvision\n",
    "import torch.nn as nn\n",
    "from torchvision.datasets import MNIST\n",
    "import torchvision.transforms as T\n",
    "\n",
    "MNIST.mirrors = [\n",
    "    mirror for mirror in MNIST.mirrors if \"http://yann.lecun.com/\" not in mirror\n",
    "]\n",
    "\n",
    "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
    "\n",
    "\n",
    "def get_dataloader(is_train, batch_size, slice=5):\n",
    "    \"Get a training dataloader\"\n",
    "    full_dataset = MNIST(\n",
    "        root=\".\", train=is_train, transform=T.ToTensor(), download=True\n",
    "    )\n",
    "    sub_dataset = torch.utils.data.Subset(\n",
    "        full_dataset, indices=range(0, len(full_dataset), slice)\n",
    "    )\n",
    "    loader = torch.utils.data.DataLoader(\n",
    "        dataset=sub_dataset,\n",
    "        batch_size=batch_size,\n",
    "        shuffle=True if is_train else False,\n",
    "        pin_memory=True,\n",
    "        num_workers=2,\n",
    "    )\n",
    "    return loader\n",
    "\n",
    "\n",
    "def get_model(dropout):\n",
    "    \"A simple model\"\n",
    "    model = nn.Sequential(\n",
    "        nn.Flatten(),\n",
    "        nn.Linear(28 * 28, 256),\n",
    "        nn.BatchNorm1d(256),\n",
    "        nn.ReLU(),\n",
    "        nn.Dropout(dropout),\n",
    "        nn.Linear(256, 10),\n",
    "    ).to(device)\n",
    "    return model\n",
    "\n",
    "\n",
    "def validate_model(model, valid_dl, loss_func, log_images=True, batch_idx=0):\n",
    "    \"Compute performance of the model on the validation dataset and log images\"\n",
    "    model.eval()\n",
    "    val_loss = 0.0\n",
    "    with torch.inference_mode():\n",
    "        correct = 0\n",
    "        for i, (images, labels) in enumerate(valid_dl):\n",
    "            images, labels = images.to(device), labels.to(device)\n",
    "\n",
    "            # forward pass\n",
    "            outputs = model(images)\n",
    "            val_loss += loss_func(outputs, labels) * labels.size(0)\n",
    "\n",
    "            # compute accuracy and accumulate\n",
    "            _, predicted = torch.max(outputs.data, 1)\n",
    "            correct += (predicted == labels).sum().item()\n",
    "\n",
    "            # log one batch of images to the dashboard\n",
    "            if i == batch_idx and log_images:\n",
    "                j = 0\n",
    "                for img, pred, target in zip(\n",
    "                    images.to(\"cpu\"), predicted.to(\"cpu\"), labels.to(\"cpu\")\n",
    "                ):\n",
    "                    op.log({\"image\": mlop.Image(data = img[0].numpy() * 255, caption=f\"{j}_{i}\")})\n",
    "                    j += 1\n",
    "\n",
    "    return val_loss / len(valid_dl.dataset), correct / len(valid_dl.dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OVG0UiH14ue7"
   },
   "source": [
    "### Train your model\n",
    "\n",
    "The following code trains and saves model checkpoints to your project. Use model checkpoints like you normally would to assess how the model performed during training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "aPJF9yg64ue7",
    "outputId": "5857a612-d86c-4ebd-89c8-a1d9f732ef6e"
   },
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "config = {\n",
    "    \"epochs\": 5,\n",
    "    \"batch_size\": 128,\n",
    "    \"lr\": 1e-3,\n",
    "    \"dropout\": random.uniform(0.01, 0.80),\n",
    "}\n",
    "\n",
    "op = mlop.init(project=\"example\", name=\"torch-mnist-cnn\", config=config)\n",
    "\n",
    "train_dl = get_dataloader(is_train=True, batch_size=config[\"batch_size\"])\n",
    "valid_dl = get_dataloader(is_train=False, batch_size=2 * config[\"batch_size\"])\n",
    "n_steps_per_epoch = math.ceil(len(train_dl.dataset) / config[\"batch_size\"])\n",
    "\n",
    "# simple MLP model\n",
    "model = get_model(config[\"dropout\"])\n",
    "\n",
    "# loss and optimizer\n",
    "loss_func = nn.CrossEntropyLoss()\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=config[\"lr\"])\n",
    "\n",
    "# training loop\n",
    "example_ct = 0\n",
    "step_ct = 0\n",
    "for epoch in range(config[\"epochs\"]):\n",
    "    model.train()\n",
    "    for step, (images, labels) in enumerate(train_dl):\n",
    "        images, labels = images.to(device), labels.to(device)\n",
    "\n",
    "        outputs = model(images)\n",
    "        train_loss = loss_func(outputs, labels)\n",
    "        optimizer.zero_grad()\n",
    "        train_loss.backward()\n",
    "        optimizer.step()\n",
    "\n",
    "        example_ct += len(images)\n",
    "        metrics = {\n",
    "            \"train/train_loss\": train_loss,\n",
    "            \"train/epoch\": (step + 1 + (n_steps_per_epoch * epoch)) / n_steps_per_epoch,\n",
    "            \"train/example_ct\": example_ct,\n",
    "        }\n",
    "\n",
    "        if step + 1 < n_steps_per_epoch:\n",
    "            op.log(metrics)\n",
    "\n",
    "        step_ct += 1\n",
    "\n",
    "    val_loss, accuracy = validate_model(\n",
    "        model, valid_dl, loss_func, log_images=(epoch == (config[\"epochs\"] - 1))\n",
    "    )\n",
    "\n",
    "    val_metrics = {\"val/val_loss\": val_loss, \"val/val_accuracy\": accuracy}\n",
    "    op.log({**metrics, **val_metrics})\n",
    "\n",
    "    torch.save(model, \"my_model.pt\")\n",
    "    # op.log({\n",
    "    #     \"model/mnist\": mlop.File(path=\"./my_model.pt\", name=f\"epoch-{epoch+1}_dropout-{round(config['dropout'], 4)}\")\n",
    "    # })\n",
    "\n",
    "    print(\n",
    "        f\"Epoch: {epoch + 1}, Train Loss: {train_loss:.3f}, Validation Loss: {val_loss:3f}, Accuracy: {accuracy:.2f}\"\n",
    "    )\n",
    "\n",
    "mlop.finish()"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
